Skip to content

feat(core): implement signeddata#273

Open
iamquang95 wants to merge 19 commits intomainfrom
iamquang95/core/signeddata
Open

feat(core): implement signeddata#273
iamquang95 wants to merge 19 commits intomainfrom
iamquang95/core/signeddata

Conversation

@iamquang95
Copy link
Collaborator

@iamquang95 iamquang95 commented Mar 6, 2026

Should fix: #250

1. New core::signeddata module

  • Added crates/core/src/signeddata.rs (large new module).
  • Implements:
    • Signed-data wrappers
    • Versioned payload handling
    • Serde (JSON) encode/decode paths
    • Signature/message-root helpers
    • Typed error surface (SignedDataError)

2. Expanded ETH2 API spec/model coverage

  • Added/expanded many spec types under:
    • crates/eth2api/src/spec/
  • Includes Phase0 → Fulu structures, plus:
    • serde helper utilities
    • SSZ helper types
    • version wrappers in crates/eth2api/src/versioned.rs

3. eth2util type/version bridge

  • Added crates/eth2util/src/types.rs
  • Provides:
    • Data/builder version conversions
    • SignedEpoch type + serde/tree-hash behavior

4. Tests and golden vectors

  • Added crates/core/testdata/signed_data_golden.json
  • Added/expanded fixtures and parameterized tests (test-case) for parity and serde/root checks

@iamquang95 iamquang95 marked this pull request as ready for review March 9, 2026 17:23
Copy link
Collaborator

@emlautarom1 emlautarom1 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It's extremely hard to validate the PR given its length, so I'll share my general concerns here before merging:

  • Some golden tests or fixtures have no clear sources. From where are we getting these?
  • We don't have any integration tests with a real Beacon client and adding them is non trivial.
  • There are seveal data types that look to be duplicated or even triplicated.
  • We're adding yet annother SSZ library into the mix, so now we have 3 versions: crates/cluster/ssz, tree_hash and ssz.
  • There are some additional unused dependencies.

I don't want to block on this since it has very little business logic (it's mostly moving data around) but still I would like to evaluate:

WDYT @varex83 ?

Cargo.toml Outdated
Comment on lines +92 to +93
ssz = { package = "ethereum_ssz", version = "0.10.1" }
ssz_derive = { package = "ethereum_ssz_derive", version = "0.10.1" }
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We already have tree_hash to deal with ssz. Why do we need to include a second library for the same features? Also, including this dependencies forces us to add typenum too.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Removed ssz and typenum. Self-implement a lightweight BitList and BitVector

use serde::{Deserialize, Serialize, de::Error as DeError};
use tree_hash::{Hash256, PackedEncoding, TreeHash, TreeHashType, merkle_root, mix_in_length};

fn tree_hash_bytes<T: TreeHash>(values: &[T]) -> Vec<u8> {
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This looks to be the same as pluto_cluster::ssz_hasher::Hasher::put_bytes

/// Consensus data version.
#[derive(Debug, Clone, Copy, PartialEq, Eq, Hash, Default, Serialize, Deserialize)]
#[serde(rename_all = "lowercase")]
pub enum DataVersion {
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is the same as pluto_eth2util::types::DataVersion except for the custom Serialize/Deserialize. Same thing for BuilderVersion.

I see that the difference is on the JSON serialization. What exactly is expected by the Beacon API? From what I could tell it uses the enum lowercase names as strings.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The different between pluto_eth2util::types::DataVersion and eth2api::spec::version is:

  • pluto_eth2util::types::DataVersion uses number as serialization
  • eth2api::spec::version uses string

pub signature: phase0::BLSSignature,
}

impl Serialize for SignedEpoch {
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think we can replace some of these custom instances with serde annotations.

@iamquang95 iamquang95 force-pushed the iamquang95/core/signeddata branch from e056798 to ba8d6b7 Compare March 11, 2026 11:58
/// values.
pub struct Hex0x;

impl<T> SerializeAs<T> for Hex0x
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think we have something similar in the cluster

Copy link
Collaborator

@varex83 varex83 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

As @emlautarom1 said its very hard to review this PR, and we already talked in the slack regarding it.

I would suggest using the approach with autogenerating the types together with some ssz derives (most likely I feel we will need to make our own, but not sure)

@iamquang95
Copy link
Collaborator Author

@varex83 @emlautarom1 I removed the ssz, ssz_derive, and typenum dependencies. There is currently some duplicated code related to SSZ, but the active implementation lives in pluto_cluster. The next refactor step will be to move the core SSZ-related logic into a new pluto_ssz crate so it can serve as the centralized place for SSZ functionality.

For code generation, the structs are implemented based on the specs here:
https://github.com/ethereum/consensus-specs/blob/master/specs

So far, I haven't found an efficient way to generate them automatically. As discussed, one possible approach is to reuse the structs from Alloy. The challenge is that we would still need to reimplement tree_hash, since those structs do not derive TreeHash.

The golden test data comes from Charon. For some additional test fixtures, I used a Go library to generate the structs and corresponding hashes, and then used those outputs as test cases for the Rust implementation.

@varex83
Copy link
Collaborator

varex83 commented Mar 13, 2026

@claude can you review the PR?

@claude
Copy link

claude bot commented Mar 13, 2026

Claude finished @varex83's task in 2m 45s —— View job


PR Review

  • Gather context and understand request
  • Read key changed files
  • Review core/signeddata.rs
  • Review eth2api spec types and SSZ helpers
  • Review eth2util/types.rs
  • Review test fixtures and golden tests
  • Post comprehensive review

Summary

This PR ports the Go core/signeddata module to Rust, adding fork-aware signed-data wrappers (proposal, attestation, voluntary exit, RANDAO, etc.), consensus-spec struct types for Phase0–Fulu, and SSZ/TreeHash helpers. The implementation approach is sound and the test coverage is reasonable. The most pressing concerns are a handful of unreachable! macro calls in production code (which technically violates workspace policy), an inconsistent error variant in one path, redundant .to_vec() calls, and unclear sourcing for some golden test vectors.


Findings

[High] unreachable! macros in production code

crates/eth2api/src/spec/ssz_types.rs has unreachable!("List should never be packed.") and friends in tree_hash_packed_encoding and tree_hash_packing_factor for SszList, SszVector, BitList, and BitVector. These are never called in practice (the tree_hash crate only calls them for Basic-typed values), but they panic at runtime if ever triggered and violate the unwrap_used = "deny" / no-panics rule from AGENTS.md.

Evidence: crates/eth2api/src/spec/ssz_types.rs:79, :83, :147, :151, :279, :283, :347, :351

Recommendation: Replace with a panic-free contract — return a no-op PackedEncoding for the encoding method and 1 for the packing factor, and add a doc comment that these should never be reached. Alternatively, match the tree_hash ecosystem convention which documents this pattern as intentional. Either way, a lint #[allow(clippy::unreachable)] with an explicit comment explaining why it is safe is more compliant with the workspace policy than a bare unreachable!. Fix this →


[Medium] Inconsistent error variant in VersionedAttestation::set_signature

VersionedAttestation::set_signature returns SignedDataError::UnknownAttestationVersion for the unknown-version branch (line ~459), but signature() and message_root() on the same type return SignedDataError::UnknownVersion. The Go reference always uses a consistent error for the unknown-version guard. This divergence makes callers harder to reason about and breaks parity.

Evidence: crates/core/src/signeddata.rs:459 vs :444 and :471

Recommendation: Use SignedDataError::UnknownVersion consistently. Fix this →


[Medium] Redundant .to_vec() after BitList::into_bytes()

BitList::into_bytes() already returns Vec<u8>. Calling .to_vec() on the result is a redundant no-op clone of an already-owned vector.

Evidence: crates/eth2api/src/versioned.rs:229 (and :317)

Recommendation: Remove the redundant .to_vec() calls. Fix this →


[Medium] Three partially-overlapping DataVersion types

There are now three version enums in the workspace:

  • eth2api::spec::version::DataVersion — string serialization ("phase0", etc.)
  • eth2util::types::DataVersion — numeric serialization (legacy go-eth2-client format, used for inter-node wire format)
  • versioned::DataVersion — a re-export of the eth2api variant

The conversion methods (to_eth2 / from_eth2) between these are correct, but the three-way split creates maintenance burden (all three must be kept in sync when a new fork is added) and confused past reviewers. This echoes the concern raised by @emlautarom1.

The PR description explains the rationale (numeric vs. string serialization formats), which is valid. However, consider whether this can be collapsed into one type with two Serialize/Deserialize configurations using serde attributes or a serde_as adapter, rather than two separate enum definitions.


[Medium] Golden test vector sources are undocumented in the file

crates/core/testdata/signed_data_golden.json and crates/eth2api/src/test_fixtures.rs contain hardcoded hashes and JSON blobs without embedded attribution. The PR description mentions they come from Charon or a custom Go harness, but a future reader cannot verify this from the code alone.

Recommendation: Add a short header comment in both files with the provenance: e.g., // Generated by <script> against charon@v1.7.1 — see <path-to-script> or at minimum note the source in a companion testdata/README.md.


[Low] FuluBlinded uses electra::SignedBlindedBeaconBlock

In both versioned::SignedProposalBlock::FuluBlinded and versioned::SignedBlindedProposalBlock::Fulu, the payload type is electra::SignedBlindedBeaconBlock. This is intentional (Fulu reuses Electra block structures), but the inconsistency with versioned::SignedProposalBlock::Fulu (which uses fulu::SignedBlockContents) is surprising. A comment explaining why Fulu blinded reuses the Electra type would prevent future confusion.

Evidence: crates/eth2api/src/versioned.rs:49, :130


[Low] Missing test coverage for Deneb (unblinded) in the golden roundtrip tests

Looking at versioned_signed_proposal test cases (line ~2469), Deneb with blinded=true is not listed as a test case even though both unblinded and blinded Deneb paths are implemented. Similarly, Fulu, false is missing. This leaves deserialization paths untested.

Evidence: crates/core/src/signeddata.rs:2469–2478


Parity Matrix

Component Go (charon v1.7.1) Rust Match Notes
VersionedSignedProposal serde version numeric + blinded flag version numeric + blinded flag yes
VersionedAttestation serde version numeric, validator_index string same yes
SignedRandao message root hash_tree_root(epoch) hash_root(&self.0) (full SignedEpoch) gap See note below
BeaconCommitteeSelection message root hash_tree_root(slot) hash_root(&self.0.slot) yes
SignedAggregateAndProof message root hash_tree_root(aggregate_and_proof) hash_root(&payload.message) yes Signs AggregateAndProof, not the outer signed wrapper
Signature serde base64 base64 yes
DataVersion wire (inter-node) numeric u64 numeric u64 via eth2util::types yes

Note on SignedRandao message root: In Go, the RANDAO message root is hash_tree_root(epoch) (just the uint64 epoch). The Rust implementation hashes the full SignedEpoch struct (which has #[tree_hash(skip_hashing)] on signature), effectively hashing epoch only — so this does match. However, this is non-obvious and deserves a comment.


Tests

Tests are present and cover the main happy-path and error cases well. The golden-vector approach is effective for detecting tree-hash divergence. Gaps noted above (Deneb unblinded/Fulu unblinded roundtrip, and vector provenance) are the main issues.


Open Questions / Assumptions

  1. Are there any validators (e.g., length, domain type) missing from the VersionedAttestation::new constructor that the Go version enforces?
  2. The SszList<T, 0> case in tree_hash_root treats MAX == 0 as "unbounded" (minimum_leaf_count = 0). Is this the intended sentinel value, or should MAX = 0 be disallowed at the type level?

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

Implement core/signneddata

3 participants